
    Forecasting Epidemiological Behavior of Respiratory Diseases in Mexico City Using Neural Networks

    Abstract: In this paper we describe the application of a Neural Network Model (NNM), a multi-layer backpropagation perceptron, to forecast the number of respiratory diseases in Mexico City as a function of the events detected in 5 neighboring states during the weeks preceding the dates of interest. The model was derived from data collected by the Dirección General de Epidemiología (DGE) of the Secretaría de Salud, the Mexican department in charge of detecting and containing events of epidemiological significance. Accurate forecasting of such incidences allows the DGE to take adequate measures, facilitating the correct supply of medications as well as the assignment of the personnel responsible for primary care.
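    The abstract describes a multi-layer backpropagation perceptron fed with lagged weekly counts from the neighboring states. Below is a minimal sketch of that setup on synthetic data, using scikit-learn's MLPRegressor in place of the authors' network; the 4-week lag, the hidden-layer size, and all the data are assumptions for illustration, not values from the paper.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n_weeks, n_states, lag = 300, 5, 4   # lag = number of preceding weeks (assumed)

# Synthetic weekly case counts for the 5 neighboring states.
states = rng.poisson(lam=50.0, size=(n_weeks, n_states)).astype(float)

# Feature vector for a target week: the `lag` preceding weeks of all
# 5 states, flattened into one row.
X = np.stack([states[t:t + lag].ravel() for t in range(n_weeks - lag)])
# Synthetic target: a noisy function of the feature window, standing in
# for Mexico City's weekly count (the real DGE data is not reproduced).
y = 0.3 * X.sum(axis=1) / lag + rng.normal(0.0, 5.0, len(X))

# Chronological split: train on earlier weeks, test on later ones.
X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.2, shuffle=False)
model = MLPRegressor(hidden_layer_sizes=(20,), max_iter=2000, random_state=0)
model.fit(X_tr, y_tr)
print("held-out R^2:", model.score(X_te, y_te))
```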

    A Clustering Method Based on the Maximum Entropy Principle

    Clustering is an unsupervised process that determines which unlabeled objects in a set share interesting properties. The objects are grouped into k subsets (clusters) whose elements optimize a proximity measure. Methods based on information theory have proven to be feasible alternatives. They rely on the assumption that a cluster is a subset with the minimal possible degree of “disorder”, and they attempt to minimize the entropy of each cluster. We propose a clustering method based on the maximum entropy principle. Such a method explores the space of all possible probability distributions of the data to find one that maximizes the entropy subject to extra conditions based on prior information about the clusters. The prior information is based on the assumption that the elements of a cluster are “similar” to each other according to some statistical measure. As a consequence of this principle, distributions of high entropy that satisfy the conditions are favored over others. Searching the space to find the optimal distribution of objects in the clusters is a hard combinatorial problem, which disallows the use of traditional optimization techniques; genetic algorithms are a good alternative for solving it. We benchmark our method against the best theoretical performance, given by the Bayes classifier when data are normally distributed, and against a multilayer perceptron network, which offers the best practical performance when data are not normal. In general, a supervised classification method will outperform an unsupervised one, since in the supervised case the elements of the classes are known a priori. In what follows, we show that our method’s effectiveness is comparable to that of a supervised one. This clearly exhibits the strength of our method.
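    A hedged sketch of the approach described above: a genetic algorithm searches the space of cluster assignments, and each assignment is scored by the entropy it induces minus a within-cluster scatter penalty standing in for the “similarity” conditions. The fitness function, penalty form, and GA parameters below are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

rng = np.random.default_rng(1)
n, k, d = 120, 3, 2
# Synthetic data: three well-separated Gaussian blobs.
data = np.vstack([rng.normal(c, 0.5, size=(n // k, d)) for c in (0.0, 3.0, 6.0)])

def fitness(assign):
    # Entropy of the cluster-size distribution (to be maximized) ...
    p = np.bincount(assign, minlength=k) / n
    p = p[p > 0]
    entropy = -(p * np.log(p)).sum()
    # ... penalized by within-cluster scatter (the "similarity" condition).
    scatter = sum(data[assign == j].var() for j in range(k) if (assign == j).any())
    return entropy - scatter

def evolve(pop_size=60, generations=200, mut_rate=0.02):
    # Each individual is a full assignment of the n objects to k clusters.
    pop = rng.integers(0, k, size=(pop_size, n))
    for _ in range(generations):
        scores = np.array([fitness(ind) for ind in pop])
        # Tournament selection: keep the better of two random individuals.
        i, j = rng.integers(0, pop_size, (2, pop_size))
        parents = np.where((scores[i] > scores[j])[:, None], pop[i], pop[j])
        # One-point crossover between consecutive parents.
        cut = rng.integers(1, n, pop_size)
        children = parents.copy()
        children[1::2, :] = np.where(
            np.arange(n) < cut[1::2, None], parents[1::2], parents[:-1:2])
        # Mutation: reassign a few objects to random clusters.
        mask = rng.random(children.shape) < mut_rate
        children[mask] = rng.integers(0, k, mask.sum())
        pop = children
    return max(pop, key=fitness)

labels = evolve()
print("cluster sizes:", np.bincount(labels, minlength=k))
```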

    Pattern Based Lossless Data Compression

    Abstract. In this paper we discuss a method for lossless data compression (LDC) which relies on finding a set of patterns (each of which we call a metasymbol) in a set of data whose elements (which we call symbols) are of arbitrary size and which is, itself, also of arbitrary size. This arbitrary data set is called a message. To achieve LDC two things are necessary: a) a method to find the metasymbols, and b) a scheme to represent the message as a function of these metasymbols. In the past, LDC has been attempted, among other methods, by using the probability that a given symbol or combination of symbols appears in the message (as in the Huffman and PPM encoding schemes) or by keeping a record of the last K symbols in the message’s stream and using references to this record to represent the data (as in the several variations of the Lempel-Ziv compression schemes). In both of these approaches the underlying data structure on which the method is based is fixed a priori. Furthermore, the compression ratio of both approaches changes even in the presence of similar patterns in the structure of the message. The structure of the metasymbols in our approach, however, does not depend on aprioristic considerations: the structure of every metasymbol is arbitrary and, in general, different from that of every other. We compare our method, Metasymbolic Lossless Data Compression (MLDC), to traditional encoding schemes, showing its potential superiority. We also show that MLDC depends on the structure of the patterns of symbols and NOT on the symbols themselves, and we establish under which conditions MLDC is superior to other LDC schemes. Keywords. Data compression, Losslessness, Encoding, Information theory, Ergodicity.
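    The abstract does not spell out how metasymbols are discovered, so the sketch below substitutes a greedy pair-substitution scheme (in the spirit of byte-pair encoding): the most frequent adjacent pair of symbols is repeatedly named as a fresh metasymbol and the message is re-expressed in terms of it. It illustrates representing a message as a function of discovered patterns, not the paper's actual MLDC algorithm.

```python
from collections import Counter

def compress(message, rounds=10):
    seq = list(message)
    table = {}                      # metasymbol id -> the pair it stands for
    next_id = 0
    for _ in range(rounds):
        pairs = Counter(zip(seq, seq[1:]))
        if not pairs:
            break
        pair, count = pairs.most_common(1)[0]
        if count < 2:               # no pattern repeats; nothing to gain
            break
        meta = ("M", next_id)       # fresh metasymbol token
        table[meta] = pair
        next_id += 1
        out, i = [], 0              # replace every occurrence of the pair
        while i < len(seq):
            if i + 1 < len(seq) and (seq[i], seq[i + 1]) == pair:
                out.append(meta)
                i += 2
            else:
                out.append(seq[i])
                i += 1
        seq = out
    return seq, table

def decompress(seq, table):
    # Expand metasymbols recursively until only raw symbols remain.
    out = []
    for sym in seq:
        if sym in table:
            out.extend(decompress(list(table[sym]), table))
        else:
            out.append(sym)
    return out

msg = "abcabcabcxyzxyz"
seq, table = compress(msg)
assert "".join(decompress(seq, table)) == msg
print(len(msg), "symbols ->", len(seq), "tokens +", len(table), "table entries")
```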